Handling large XML documents in Process Platform - Streaming XML content
Certain business scenarios involve processing large volumes of data as part of executing a business process instance. In such cases, instead of processing all the data in a single, performance-intensive step, it is better to handle the data, represented as an XML file, as a steady, continuous stream. This requires control over how the Process Engine processes the data.
This approach offers the following benefits:
- Less memory consumption
- Increased performance
Salary Processing of Railway Employees
Let us assume a salary-processing scenario for employees in a government organization such as the railways. Salaries are processed as a batch: the employee information resides in a legacy system, and for salary processing the organization queries the legacy system and creates a batch file. These batch files are huge (gigabytes in size). In the business process model, for every employee, you have to get the employee's work details, process the salary, and generate the pay slip.
Without streaming support, you cannot stream the XML data from a file into the Business Process Management Service container. You have to load the entire file into the container's memory, which eventually causes an out-of-memory problem. To avoid this, as a developer, you have to convert the employee records in the XML batch file into records in a table, generate the query methods, and use these methods in the business process model. Developing, packaging, and deploying batch processing models in this way is tedious and time-consuming.
Streaming Support
- Streaming support in XPath: XPath now provides a streaming API. Using this API, you can navigate through a collection of records without loading the entire XML into memory.
- Streaming support in the For Each construct:
  - In the For Each BPMN construct, you provide an XPath expression that returns a collection of XML nodes (records) over which you iterate. The For Each construct is now extended to support streaming XPath expressions.
  - The For Each construct works in two modes: Normal mode and Streaming XPath mode.
  - When the Streaming XPath mode is selected, you must provide the file URL to the For Each construct through a message in the message map.
  - At runtime, the engine resolves the URL from the message, creates an XPathReader with the URL, and streams the records from the file one by one, one record per iteration.
  - By default, the streamed record is deleted after each iteration. However, if you want to keep the record, an option is provided for that.
  - After a crash, recovery can resume streaming from the point that had been reached when the crash happened.
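
The record-by-record iteration and the resume-after-crash behavior described above can be illustrated with a minimal sketch. It does not use the Process Platform XPathReader, whose API is not shown here; it uses the standard Java StAX API (javax.xml.stream) as a stand-in for the same idea. The class name `EmployeeRecordStreamer`, the element name `employee`, the attribute `id`, and the `alreadyProcessed` checkpoint parameter are all hypothetical and chosen only for illustration.

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative stand-in only: plain StAX, not the Process Platform XPathReader.
public class EmployeeRecordStreamer {

    /**
     * Reads <employee> records (hypothetical element name) one at a time,
     * so the whole document is never held in memory.
     *
     * @param source           the XML batch data
     * @param alreadyProcessed number of records handled before a crash;
     *                         these are skipped when streaming resumes
     * @param handler          called with the id attribute of each new record
     * @return total number of records seen, usable as the next checkpoint
     */
    public static int streamEmployees(Reader source, int alreadyProcessed,
                                      Consumer<String> handler)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Batch files come from outside the process, so turn off DTDs
        // and external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(source);
        int seen = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "employee".equals(reader.getLocalName())) {
                    seen++;
                    // Skip records that were completed before the checkpoint.
                    if (seen > alreadyProcessed) {
                        handler.accept(reader.getAttributeValue(null, "id"));
                    }
                }
            }
        } finally {
            reader.close();
        }
        return seen;
    }
}
```

For a file with three employee records, calling `streamEmployees` with `alreadyProcessed` set to 2 invokes the handler only for the third record. Keeping the checkpoint as a simple record count is one possible design; how the engine itself tracks its position is not described here.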